The area under the ROC curve as a measure of clustering quality
Authors
Abstract
The Area Under the Receiver Operating Characteristics (ROC) Curve, referred to as AUC, is a well-known performance measure in the supervised learning domain. Due to its compelling features, it has been employed in a number of studies to evaluate and compare the performance of different classifiers. In this work, we explore AUC as a performance measure in the unsupervised learning domain, more specifically, in the context of cluster analysis. In particular, we elaborate on the use of AUC as an internal/relative measure of clustering quality, which we refer to as Area Under the Curve for Clustering (AUCC). We show that the AUCC of a given candidate clustering solution has an expected value under a null model of random clustering solutions, regardless of the size of the dataset and, more importantly, regardless of the number or the (im)balance of the clusters under evaluation. In addition, we elaborate on the fact that, in the context of internal/relative clustering validation as we consider, AUCC is actually a linear transformation of the Gamma criterion from Baker and Hubert (1975), for which we also formally derive a theoretical expected value for chance clusterings. We also discuss the computational complexity of these criteria and show that, while an ordinary implementation of Gamma can be computationally prohibitive and impractical for most real applications of cluster analysis, its equivalence with AUCC actually unveils a much more efficient algorithmic procedure. Our theoretical findings are supported by experimental results. These results show that, in addition to an effective and robust quantitative evaluation provided by AUCC, visual inspection of the ROC curves themselves can be useful to further assess a candidate clustering solution from a broader, qualitative perspective as well.
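As an illustration of the idea in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes that AUCC treats every pair of objects as one instance, with "same cluster" as the binary label and pairwise distance as the score, and it evaluates the AUC in its Mann-Whitney form. The function names `aucc` and `gamma_from_aucc`, the use of Euclidean distance, and the toy data are assumptions made for this example. The brute-force comparison is for clarity only and does not attempt the more efficient procedure the abstract mentions.

```python
from itertools import combinations
from math import dist


def aucc(points, labels):
    """AUC of pairwise distance as a predictor of 'same cluster' (hypothetical helper).

    Brute force over all (within-pair, between-pair) combinations, for clarity only.
    """
    pairs = list(combinations(range(len(points)), 2))
    # Euclidean distance for each object pair (smaller distance = more similar).
    d = [dist(points[i], points[j]) for i, j in pairs]
    same = [labels[i] == labels[j] for i, j in pairs]
    within = [x for x, s in zip(d, same) if s]
    between = [x for x, s in zip(d, same) if not s]
    if not within or not between:
        raise ValueError("need at least one within-cluster and one between-cluster pair")
    # Mann-Whitney form of the AUC: fraction of (within, between) combinations
    # in which the within-cluster distance is the smaller one; ties count one half.
    wins = sum((w < b) + 0.5 * (w == b) for w in within for b in between)
    return wins / (len(within) * len(between))


def gamma_from_aucc(value):
    # Assumed form of the linear relation, valid when no distances are tied:
    # Gamma = 2 * AUCC - 1.
    return 2 * value - 1


# Toy data: two well-separated groups of two points each (made up for illustration).
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
good = aucc(points, [0, 0, 1, 1])  # partition that matches the groups
bad = aucc(points, [0, 1, 0, 1])   # partition that splits both groups
print(good, bad, gamma_from_aucc(good))
```

In this toy case the partition that matches the two groups scores higher than the one that splits them, which is the behaviour expected of an internal/relative validity criterion.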
Similar resources
The Area under the ROC Curve as a Criterion for Clustering Evaluation
In the literature, there are several criteria for validation of a clustering partition. Those criteria can be external or internal, depending on whether we use prior information about the true class labels or only the data itself. All these criteria assume a fixed number of clusters k and measure the performance of a clustering algorithm for that k. Instead, we propose a measure that provides t...
Boosting the Area under the ROC Curve
We show that any weak ranker that can achieve an area under the ROC curve slightly better than 1/2 (which can be achieved by random guessing) can be efficiently boosted to achieve an area under the ROC curve arbitrarily close to 1. We further show that this boosting can be performed even in the presence of independent misclassification noise, given access to a noise-tolerant weak ranker.
Area under the curve as a measure of discounting.
We describe a novel approach to the measurement of discounting based on calculating the area under the empirical discounting function. This approach avoids some of the problems associated with measures based on estimates of the parameters of theoretical discounting functions. The area measure may be easily calculated for both individual and group data collected using any of a variety of current...
The investigation of the relationship between type A and type B personalities and quality of translation
No abstract available.
Estimation of the area under the ROC curve.
The area under the receiver operating characteristic curve is frequently used as a measure for the effectiveness of diagnostic markers. In this paper we discuss and compare estimation procedures for this area. These are based on (i) the Mann-Whitney statistic; (ii) kernel smoothing; (iii) normal assumptions; (iv) empirical transformations to normality. These are compared in terms of bias and ro...
Journal
Journal title: Data Mining and Knowledge Discovery
Year: 2022
ISSN: 1573-756X, 1384-5810
DOI: https://doi.org/10.1007/s10618-022-00829-0